Exploring CxE metadata and weather data

This is the primary notebook accompanying the manuscript "A case-study for improved reusability of plant phenotyping data with MIAPPE". The manuscript refers to the section numbers used below.

Available files (in the data-generated folder and subfolders):

0. Setup and helper functions

1. Exploring the Investigation and Study metadata

1.1. Investigation properties

We can start with the top MIAPPE object, the Investigation, and examine the properties it has:

1.2. Investigation property values

Next, we can check the values of those properties. For those that are not plain literals, we also choose to show their class here (if given):

1.3. Study properties

Next, we look at the studies, the next object in the MIAPPE hierarchy, and at some of their properties.

2. Verifying that the experiments have overlapping genotype sets

We can easily find the genotypes (biological materials) that were part of all 5 studies:

We can also calculate the total number of genotypes, i.e. the ones that were studied in at least one experiment:
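As a minimal sketch of the two calculations above, the overlap and the total can be expressed as a set intersection and a set union. The genotype identifiers and the `genotypes_by_study` dictionary are hypothetical placeholders; in the notebook the sets would come from the study metadata.

```python
# Hypothetical genotype identifiers per study, for illustration only.
genotypes_by_study = {
    "1999NL": {"G1", "G2", "G3", "G4"},
    "2003VE": {"G1", "G2", "G3"},
    "2004Fin": {"G1", "G2", "G5"},
    "2005Fin": {"G1", "G2", "G3", "G5"},
    "2010ET": {"G1", "G2", "G6"},
}

# Genotypes that were part of every study (set intersection).
common_genotypes = set.intersection(*genotypes_by_study.values())

# Genotypes that were studied in at least one experiment (set union).
all_genotypes = set.union(*genotypes_by_study.values())

print(sorted(common_genotypes))
print(len(all_genotypes))
```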

3. Checking if there is weather data available for the location of interest

3.1. Retrieve weather stations and their GPS coordinates from SPARQL endpoint

3.2. Retrieve and plot sun hours for the dates of the Netherlands experiment

Plot (hours of sunlight):

3.3. Compare distances between experimental locations and weather station coordinates
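One common way to compare a field location with weather station coordinates is the haversine great-circle distance. The sketch below assumes coordinates in decimal degrees; the function names, the field location, and the station names and coordinates are hypothetical and are not taken from the dataset.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest_station(location, stations):
    """Return (station_name, distance_km) for the station closest to `location`."""
    return min(
        ((name, haversine_km(*location, *coords)) for name, coords in stations.items()),
        key=lambda pair: pair[1],
    )

# Hypothetical coordinates (latitude, longitude), for illustration only.
field = (52.0, 5.6)
stations = {"station_a": (52.1, 5.2), "station_b": (51.4, 6.2)}
name, distance = nearest_station(field, stations)
print(name, round(distance, 1))
```

The haversine formula treats the Earth as a sphere, which is usually accurate enough for deciding which station is closest.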

4. Focusing on a trait and plotting it for all experiments, for each genotype

In this section we read in the data files for each study. We can view the variables in a data file for a study as follows:

4.1. Netherlands 1999 data

Create the data table for this study. We will also add a column to hold the genotypes for later use.

(The code for putting SPARQLWrapper output into a dataframe is based on this page)
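For orientation, here is a minimal sketch of such a conversion. It is not necessarily the code from the referenced page. It assumes the query result has already been converted to the standard SPARQL JSON results structure (a `head` with `vars` and a `results` object with `bindings`); the helper name `bindings_to_rows` and the example response are hypothetical.

```python
def bindings_to_rows(sparql_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = sparql_json["head"]["vars"]
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        # Unbound variables are absent from a binding, so fall back to None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows

# Hypothetical response in the SPARQL JSON results format.
example = {
    "head": {"vars": ["genotype", "weight"]},
    "results": {"bindings": [
        {"genotype": {"type": "literal", "value": "G1"},
         "weight": {"type": "literal", "value": "1.5"}},
        {"genotype": {"type": "literal", "value": "G2"}},
    ]},
}
rows = bindings_to_rows(example)
print(rows)
```

A list of row dictionaries like this can be passed to `pandas.DataFrame(rows)` to obtain a table. All values arrive as strings, so numeric columns still need an explicit conversion.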

4.2. Venezuela 2003 data

Once again, when building the data table, we will also add a column to hold the genotypes for later use.

4.3. Ethiopia 2010 data

The data for Ethiopia is sparse, so constructing the full table with SPARQL is inconvenient. We only get the tuber weight per plant:

For the Netherlands and Venezuela, however, we only have the average tuber weight per genotype. We therefore calculate the same average for Ethiopia, and later for Finland. For Ethiopia the result is stored in a new table, datatable_2010ET2.
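The per-genotype averaging can be sketched as a group-and-mean step. The function name and the example records are hypothetical; the notebook works on the data tables built from the study files.

```python
from collections import defaultdict
from statistics import mean

def mean_weight_per_genotype(records):
    """Average the per-plant tuber weights for each genotype.

    `records` is an iterable of (genotype, tuber_weight) pairs.
    """
    grouped = defaultdict(list)
    for genotype, weight in records:
        grouped[genotype].append(weight)
    return {genotype: mean(weights) for genotype, weights in grouped.items()}

# Hypothetical per-plant measurements, for illustration only.
plant_records = [("G1", 1.0), ("G1", 2.0), ("G2", 0.5), ("G2", 1.5), ("G2", 1.0)]
averages = mean_weight_per_genotype(plant_records)
print(averages)
```

With the data in a pandas DataFrame, the equivalent step would be a `groupby` on the genotype column followed by `mean()`.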

4.4. Finland 2004 data

The data for Finland is sparse, so constructing the full table with SPARQL is inconvenient. We only get the tuber weight per plant, for block 8 (the one that was harvested last):

Calculating the average tuber weight per genotype:

4.4.1. Get plant height data

4.5. Finland 2005 data

The data for Finland is sparse, so constructing the full table with SPARQL is inconvenient. We only get the tuber weight per plant:

Calculating the average tuber weight per genotype:

5. Compare the genotypes that overlap in all experiments

6. Compare all genotypes in all experiments

6.1. Scatter plot

6.2. Stacked bar graph

7. Making plots combining this trait with the weather information

7.1. Get weather data for all locations

For Netherlands 1999 (df_weather_1999NL):

For Venezuela 2003 (df_weather_2003VE):

For Finland 2004 (df_weather_2004Fin):

For Finland 2005 (df_weather_2005Fin):

For Ethiopia 2010 (df_weather_2010ET):

7.2. Calculate (cumulative) photo-beta thermal time

Beta thermal days: $\Large g \left ( T \right) = \left [ \left (\frac{T_c - T}{T_c-T_0} \right ) \cdot \left (\frac{T - T_b}{T_0-T_b} \right ) ^{\frac{T_0 - T_b}{T_c - T_0}} \right ]^{c_t}$
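The formula above can be sketched in code as follows. This covers only the temperature response $g(T)$ and its running sum, not a photoperiod component. It reads $T_b$, $T_0$ and $T_c$ as the base, optimum and ceiling temperatures, which is an assumption because the text does not define them. Returning zero outside the range between $T_b$ and $T_c$ is also an assumption, as are the function names and the parameter values in the example.

```python
def beta_thermal(T, T_b, T_0, T_c, c_t=1.0):
    """Beta temperature response g(T) for a single daily temperature T."""
    # Assumption: no thermal time accumulates at or beyond the base/ceiling.
    if T <= T_b or T >= T_c:
        return 0.0
    exponent = (T_0 - T_b) / (T_c - T_0)
    g = ((T_c - T) / (T_c - T_0)) * ((T - T_b) / (T_0 - T_b)) ** exponent
    return g ** c_t

def cumulative_beta_thermal(daily_temperatures, T_b, T_0, T_c, c_t=1.0):
    """Running sum of g(T) over a sequence of daily temperatures."""
    total = 0.0
    cumulative = []
    for T in daily_temperatures:
        total += beta_thermal(T, T_b, T_0, T_c, c_t)
        cumulative.append(total)
    return cumulative

# Hypothetical parameter values and temperatures, for illustration only.
print(beta_thermal(20, T_b=5, T_0=20, T_c=30))
print(cumulative_beta_thermal([20, 0, 20], T_b=5, T_0=20, T_c=30))
```

By construction $g(T_0) = 1$, so a day at the optimum temperature counts as one full beta thermal day.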

7.3. Find location with best and worst performance for each genotype
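A minimal sketch of this step, assuming performance is measured as the mean tuber weight per genotype at each location. The nested dictionary, its values and the helper name are hypothetical.

```python
def best_and_worst_location(weights_by_location):
    """Return (best, worst) location keys by mean tuber weight."""
    best = max(weights_by_location, key=weights_by_location.get)
    worst = min(weights_by_location, key=weights_by_location.get)
    return best, worst

# Hypothetical mean tuber weights per genotype and study, for illustration only.
mean_weight = {
    "G1": {"1999NL": 1.2, "2003VE": 0.4, "2010ET": 0.9},
    "G2": {"1999NL": 0.7, "2003VE": 0.8, "2010ET": 0.3},
}
extremes = {
    genotype: best_and_worst_location(weights)
    for genotype, weights in mean_weight.items()
}
print(extremes)
```

If a genotype has equal weights at several locations, `max` and `min` return the first one encountered.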